online computation and hashing
Reviews: Scalable Generalized Linear Bandits: Online Computation and Hashing
The paper identifies two problems with existing Generalized Linear Model Bandit algorithms: their per-step computational complexity is linear in t (time-steps) and N (number of arms). The first problem is solved by the proposed GLOC algorithm (first that achieves constant-in-t runtime and O(d T ½ polylog(T)) regret) and the second by a faster approximate variant (QGLOC) based on hashing. Minor contributions are a (claimed non-trivial) generalization of linear online-to-confidence bounds to GLM, an exploration-exploitation tradeoff parameter for QGLOC that finally justifies a common heuristic used by practitioners and a novel simpler hashing method for computing QGLOC solutions. I think the paper is interesting and proposes a novel idea, but the presentation is sometimes confused and makes it hard to evaluate the impact of the contribution w.r.t. the existing literature. This is clearly not the case in the linear setting when \mu is the identity, and closed forms for \hat{theta}_t allow the solution to be computed incrementally.
Scalable Generalized Linear Bandits: Online Computation and Hashing
Jun, Kwang-Sung, Bhargava, Aniruddha, Nowak, Robert, Willett, Rebecca
Generalized Linear Bandits (GLBs), a natural extension of the stochastic linear bandits, has been popular and successful in recent years. This paper proposes new, scalable solutions to the GLB problem in two respects. First, unlike existing GLBs, whose per-time-step space and time complexity grow at least linearly with time $t$, we propose a new algorithm that performs online computations to enjoy a constant space and time complexity. At its heart is a novel Generalized Linear extension of the Online-to-confidence-set Conversion (GLOC method) that takes \emph{any} online learning algorithm and turns it into a GLB algorithm. As a special case, we apply GLOC to the online Newton step algorithm, which results in a low-regret GLB algorithm with much lower time and memory complexity than prior work.